From Perceptual Relations to Scene Gist Recognition

Authors

  • Ilan Kadar
  • Ohad Ben-Shahar
Abstract

The ability to recognize visual scenes quickly and accurately is highly beneficial for both biological and machine vision. In this work we study the process of scene gist recognition from a novel point of view and investigate whether prior knowledge of the perceptual relations between the different scene categories may help facilitate better computational models for scene gist recognition. We first introduce a psychophysical paradigm that probes human scene gist recognition and extracts perceptual relations between scene categories. Then, we show that these perceptual relations do not always conform to the semantic structure between categories. Next, we incorporate the obtained perceptual relations into a computational classification scheme, which takes inter-class relationships into account to obtain better scene recognition regardless of the particular descriptors with which scenes are represented. We present such improved recognition performance using several popular descriptors, we discuss why the contribution of inter-class perceptual relations is particularly pronounced for under-sampled training sets, and we argue that this mechanism may explain the ability of the human visual system to perform well under similar conditions. Finally, we introduce an online experimental system for obtaining perceptual relations for large collections of scene categories.

1. Perceptual Relations

A key issue in the context of scene gist recognition is perceptual relations (as opposed to semantic relations; see below), a possibility that has rarely been considered in either the perceptual or the computational literature [7, 2, 8]. However, even intuitively, when our visual system observes a bedroom scene for a fraction of a second and “deliberates” how to categorize it, what possibly comes to mind in addition to “bedroom” are classes like “living room” or “kitchen”. It appears as if our visual system does not even consider possibilities such as “coast” or “highway”, or more generally, scenes which are perceptually “distant” from the observable reference class. Put differently, prior knowledge about the perceptual relations between the different categories of scenes may help facilitate more accurate and more efficient scene gist recognition. Knowledge of such relationships could also partly explain the fact that humans are often able to learn and process hundreds of scene categories from very few training examples, while computational models usually need at least tens of training examples per category before achieving reasonable recognition performance.

Exploring relations between categories is not new and was recently promoted by exploiting WordNet as a semantic relationships database for object recognition [3]. Indeed, semantic relationships can be extracted quite conveniently from WordNet. Still, we found several examples suggesting that semantic relationships between categories do not necessarily agree with their perceptual relationships. For example, our experimental setup reveals that the “highway” category is perceptually closer to “coast” than to “kitchen”, although semantically the opposite holds.

Once relations between visual categories are considered based on perceptual criteria, two questions immediately arise. First, how can perceptual difference or distance between scene categories be determined or inferred directly from human vision? Put differently, can the perceptual distance between categories be measured psychophysically in a robust and unbiased way? Second, once determined, how could these perceptual relations be incorporated into a computational classification scheme?

We introduce a psychophysical paradigm where we briefly present two natural scene stimuli simultaneously and ask human observers whether they belong to the same scene category or not (i.e., a same/different forced choice task). Using responses collected from 79 human subjects, we analyze the subjects’ average performance to provide an unbiased, objective measure of the perceptual “distance” between the different scene categories. In particular, we calculate the subjects’ probability of responding Different for each pair of categories. Since this probability is expected to increase when such a judgment is easier, and since the latter case is expected when scenes become more “perceptually different”, this probability is termed the “perceptual distance” (PD).
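The PD measure described above can be illustrated with a minimal sketch. It assumes each trial is stored as a (category, category, response) record and estimates PD for a category pair as the fraction of Different responses pooled over all trials showing that pair. The record format, the function name `perceptual_distance`, and the toy trials are assumptions made for illustration only; they are not the authors' data or code.

```python
from collections import defaultdict


def perceptual_distance(trials):
    """Estimate PD for each unordered pair of scene categories.

    trials: iterable of (category_a, category_b, response) tuples, where
    response is "same" or "different" (hypothetical record format).
    Returns a dict mapping a sorted (category, category) tuple to the
    fraction of "different" responses observed for that pair.
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [n_different, n_total]
    for cat_a, cat_b, response in trials:
        pair = tuple(sorted((cat_a, cat_b)))  # presentation order is ignored
        if response == "different":
            counts[pair][0] += 1
        counts[pair][1] += 1
    return {pair: n_diff / n_total for pair, (n_diff, n_total) in counts.items()}


# Toy trials, invented purely to exercise the function (not experimental data).
toy_trials = [
    ("highway", "coast", "different"),
    ("coast", "highway", "same"),
    ("highway", "kitchen", "different"),
    ("kitchen", "highway", "different"),
]
distances = perceptual_distance(toy_trials)
```

With these toy trials, the highway/coast pair receives a lower estimated PD than the highway/kitchen pair, mirroring the kind of ordering reported in the text. A real analysis would pool responses over many subjects and stimuli per pair.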


Similar resources

A perceptual paradigm and psychophysical evidence for hierarchy in scene gist processing.

What is the order of processing in scene gist recognition? Following the seminal studies by Rosch (1978) and Tversky and Hemmenway (1983) it has been assumed that basic-level categorization is privileged over the superordinate level because the former maximizes both within-category similarity and between-category variance. However, recent research has begun to challenge this view (Oliva & Torra...


Superordinate Level Processing Has Priority Over Basic-Level Processing in Scene Gist Recognition

By combining a perceptual discrimination task and a visuospatial working memory task, the present study examined the effects of visuospatial working memory load on the hierarchical processing of scene gist. In the perceptual discrimination task, two scene images from the same (manmade-manmade pairing or natural-natural pairing) or different superordinate level categories (manmade-natural pairin...


The role of gist in scene recognition

Studies of change blindness suggest that we bring only a few attended features of a scene, plus a gist, from one visual fixation to the next. We examine the role of gist by substituting an original image with a second image in which a substitution of one object changes the gist, compared with a third image in which a substitution of that object does not change the gist. Small perceptual changes...


Comparing rapid scene categorization of aerial and terrestrial views: A new perspective on scene gist.

Scene gist, a viewer's holistic representation of a scene from a single eye fixation, has been extensively studied for terrestrial views, but not for aerial views. We compared rapid scene categorization of both views in three experiments to determine the degree to which diagnostic information is view dependent versus view independent. We found large differences in observers' ability to rapidly c...


Recognizing the gist of a visual scene: possible perceptual and neural mechanisms

We try to understand the basics of human image processing from a gist recognition perspective. Because the gist is only a subset of the image’s information, we think that it is extracted with the help of interpretation (feedback). In a perceptual section we list possible mechanisms that the interpretation process uses to determine the gist: in addition to the commonly known local-to-global perc...




Publication date: 2013